Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Authors

  • Nikola Ljubesic
  • Petra Bago
  • Damir Boras
Abstract

This research is the first step towards developing a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of the weather forecasts for the Adriatic, comprising 7,893 sentence pairs. Evaluation is performed with the automatic evaluation measures BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We show that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Additional improvement is expected by increasing the training set size. Finally, the correlation of the recorded evaluation measures is explored.
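The correlation between evaluation measures mentioned in the abstract could be computed along the following lines. This is only a minimal sketch: the abstract does not say which coefficient the authors used, so Pearson's r is an assumption here, the function name `pearson` is illustrative, and the score lists are hypothetical placeholders rather than results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, one value per experiment run.
# These numbers are NOT taken from the paper.
metric_a = [0.10, 0.20, 0.30, 0.40]
metric_b = [1.0, 2.0, 3.0, 4.0]
r = pearson(metric_a, metric_b)
```

In practice, each list would hold one score per experiment (for example, one per training set size), so that a high coefficient indicates two measures rank the systems similarly.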


Similar resources

Predicting Peak Sector Occupancy with Two-Hour Convective Weather Forecasts

An important function of traffic flow management is ensuring that the number of aircraft entering a sector does not exceed the amount that can be safely controlled by the sector controller. One factor that makes this task difficult is the uncertainty of the impact of convective weather, as both the weather forecast and the impact given specific weather are uncertain. In this investigation, we study this...


Speech Recognition of Slovenian and Croatian Weather Forecasts

In the paper we present some results of a joint project in speech data collection and speech recognition of Slovenian and Croatian weather forecasts. In the paper we describe the procedures we have performed in order to obtain a domain specific speech database from broadcast programmes. Additionally the speech recognition experiments are described and some speech recognition results for the Cro...


Evaluating an NLG System using Post-Editing

Computer-generated texts, whether from Natural Language Generation (NLG) or Machine Translation (MT) systems, are often post-edited by humans before being released to users. The frequency and type of post-edits is a measure of how well the system works, and can be used for evaluation. We describe how we have used post-edit data to evaluate SUMTIME-MOUSAM, an NLG system that produces weather for...


A Ridge Moving East across the North Sea This Evening . a Vigorous

In this paper, we describe SUMTIME-METEO, a parallel corpus of naturally occurring weather forecast texts and their corresponding forecast data; data that the human authors inspected while writing the forecast texts. We have analysed the corpus to acquire knowledge needed to build a text generator for automatically producing textual weather forecasts from numerical weather prediction data. Alth...


Exploiting a parallel TEXT-DATA corpus

In this paper, we describe SUMTIME-METEO, a parallel corpus of naturally occurring weather forecast texts and their corresponding forecast data; data that the human authors inspected while writing the forecast texts. We have analysed the corpus to acquire knowledge needed to build a text generator for automatically producing textual weather forecasts from numerical weather prediction data. Alth...



Journal title:
  • CIT

Volume 18  Issue 

Pages  -

Publication date 2010